Conversation

@armandsauzay
Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/2202

Add an `output_dtype` parameter to the MX4 dequantization stack to support direct
conversion to BF16/FP16, avoiding an expensive FP32 intermediate step.

Differential Revision: D87826479
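A minimal sketch of the idea behind this change, in plain PyTorch. The function name `dequantize_mx4_sketch`, the `group_size` default, and the linear-integer treatment of the 4-bit elements are illustrative assumptions, not FBGEMM's actual API (real MX4 elements are E2M1 FP4 values decoded via a lookup). The point shown is only the dtype plumbing: casting per group directly into the requested `output_dtype`, so no full-size FP32 tensor is materialized when BF16/FP16 is requested.

```python
import torch

def dequantize_mx4_sketch(
    quantized: torch.Tensor,      # 4-bit values stored as int8, one per element (simplified)
    scales: torch.Tensor,         # one shared power-of-two exponent per group (E8M0-style)
    group_size: int = 32,
    output_dtype: torch.dtype = torch.float32,
) -> torch.Tensor:
    # Cast straight to the requested dtype; with output_dtype=bfloat16/float16
    # the full-size FP32 intermediate tensor is never allocated.
    values = quantized.to(output_dtype)
    scale = torch.exp2(scales.to(output_dtype))          # decode per-group exponent
    out = values.view(-1, group_size) * scale.view(-1, 1)  # broadcast scale over each group
    return out.view_as(values)

# Usage: two 32-element groups, both with exponent 0 (scale = 1.0).
x = torch.randint(-8, 8, (64,), dtype=torch.int8)
s = torch.zeros(2)
y = dequantize_mx4_sketch(x, s, output_dtype=torch.bfloat16)
```

Before this change, callers wanting BF16 would dequantize to FP32 and then cast, paying for an extra full-size allocation and a second pass over the data.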

@meta-codesync
Contributor

meta-codesync bot commented Dec 9, 2025

@armandsauzay has exported this pull request. If you are a Meta employee, you can view the originating Diff in D87826479.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Dec 9, 2025
armandsauzay pushed a commit to armandsauzay/FBGEMM-1 that referenced this pull request Dec 9, 2025
Summary:
X-link: meta-pytorch/torchrec#3602

X-link: facebookresearch/FBGEMM#2202

Add output_dtype parameter to MX4 dequantization stack to support direct
conversion to BF16/FP16, avoiding expensive FP32 intermediate step.

Differential Revision: D87826479
armandsauzay pushed a commit to armandsauzay/torchrec that referenced this pull request Dec 10, 2025
…3602)

Summary:
X-link: pytorch/FBGEMM#5206

X-link: facebookresearch/FBGEMM#2202

Add output_dtype parameter to MX4 dequantization stack to support direct
conversion to BF16/FP16, avoiding expensive FP32 intermediate step.

Differential Revision: D87826479